fastdup by Visual-Layer
fastdup is a tool for gaining insights from large image and video collections. It can find anomalies, duplicate and near-duplicate images and videos, and clusters of similar items, and it can learn the normal behavior of a collection and the temporal interactions between its images and videos. It can be used for smart subsampling to obtain a higher-quality dataset, for outlier removal, and for novelty detection that selects new information to send for tagging. Getting started takes just 2 lines of code.
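fastdup's internal algorithm is not described here. As a rough, hedged illustration of the ideas above (near-duplicates, clusters of similarity, outliers), the sketch below assumes each image has already been turned into an embedding vector and then uses plain cosine similarity, a union-find pass to group near-duplicates into connected components, and "1 minus similarity to the nearest neighbour" as an outlier score. The file names, vectors, the 0.99 threshold, and every function name are hypothetical; this is not fastdup's API or implementation.

```python
import math

# Hypothetical toy embeddings: in practice each vector would come from a
# feature extractor run on one image. Names and values are illustrative only.
EMBEDDINGS = {
    "cat_a.jpg": [0.90, 0.10, 0.00],
    "cat_b.jpg": [0.89, 0.11, 0.01],  # near-duplicate of cat_a.jpg
    "dog_a.jpg": [0.10, 0.90, 0.05],
    "dog_b.jpg": [0.12, 0.88, 0.04],  # near-duplicate of dog_a.jpg
    "odd.jpg": [0.00, 0.05, 1.00],    # unlike everything else
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def near_duplicate_pairs(emb, threshold=0.99):
    """Return (name1, name2, similarity) for every pair at or above `threshold`."""
    names = sorted(emb)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine(emb[a], emb[b])
            if s >= threshold:
                pairs.append((a, b, s))
    return pairs


def components(names, pairs):
    """Group names into connected components of the similarity graph (union-find)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, _ in pairs:
        parent[find(a)] = find(b)  # merge the two clusters
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


def outlier_scores(emb):
    """Score each item by 1 - similarity to its nearest neighbour (higher = more isolated)."""
    return {a: 1.0 - max(cosine(emb[a], emb[b]) for b in emb if b != a) for a in emb}


if __name__ == "__main__":
    pairs = near_duplicate_pairs(EMBEDDINGS)
    print(components(sorted(EMBEDDINGS), pairs))
    # [['cat_a.jpg', 'cat_b.jpg'], ['dog_a.jpg', 'dog_b.jpg'], ['odd.jpg']]
    scores = outlier_scores(EMBEDDINGS)
    print(max(scores, key=scores.get))  # odd.jpg
```

The all-pairs loop is quadratic and only suitable for a toy example; a tool meant for hundreds of millions of images would need an approximate nearest-neighbour index instead.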
fastdup is:
- Unsupervised: fits any dataset
- Scalable: handles 400M images on a single machine
- Efficient: works on CPU only
- Low cost: can process 12M images on a $1 cloud machine budget
From the authors of GraphLab and Turi Create.
Quick installation
- Python 3.7, 3.8, 3.9
- Supported OS: Ubuntu 20.04, Ubuntu 18.04, Debian 10, Mac OSX M1, Mac OSX Intel, Windows 10 Server.
For Windows, CentOS 7.X, RedHat 4.8 and other older Linux distributions, see our Installation instructions.
What’s new in V1.0?
- Better support for labels
- Better galleries
- A new Python API
Running the code
The existing API remains fully supported.
Getting started examples
- 🔥 Finding duplicates, outliers and connected components in the Food-101 dataset, including TensorBoard Projector visualization - Google Colab
- 🔥🔥 Visualizing and understanding a new dataset, looking at data outliers and label outliers, and training a baseline KNN classifier that reaches an accuracy of 0.99 once confusing labels are removed
- Finding wrong labels via image similarity
- Computing image statistics
- Using your own ONNX model for feature extraction
- Getting started on a Kaggle dataset
- Deduplication of videos - Google Colab
- Analyzing video of the MEVA dataset - Google Colab
- Working with multiple labels per image
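Several of the examples above revolve around label outliers and finding wrong labels via image similarity. One common way to do this, shown here as a hedged sketch rather than fastdup's own method, is a k-nearest-neighbour disagreement check: an image is flagged when most of its k most similar images carry a different label. The sample names, 2-D vectors, `k=3`, and the `suspicious_labels` helper are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled embeddings: "mislabeled.jpg" sits among the cats
# in embedding space but is tagged "dog".
SAMPLES = [
    ("cat1.jpg", [0.90, 0.10], "cat"),
    ("cat2.jpg", [0.88, 0.12], "cat"),
    ("cat3.jpg", [0.92, 0.08], "cat"),
    ("mislabeled.jpg", [0.91, 0.09], "dog"),
    ("dog1.jpg", [0.10, 0.90], "dog"),
    ("dog2.jpg", [0.12, 0.88], "dog"),
    ("dog3.jpg", [0.08, 0.92], "dog"),
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def suspicious_labels(samples, k=3):
    """Flag samples whose label disagrees with a strict majority of their k nearest neighbours."""
    flagged = []
    for name, vec, label in samples:
        # Rank every other sample by similarity and keep the k closest.
        neighbours = sorted(
            (other for other in samples if other[0] != name),
            key=lambda other: cosine(vec, other[1]),
            reverse=True,
        )[:k]
        majority, count = Counter(other[2] for other in neighbours).most_common(1)[0]
        if majority != label and count > k // 2:
            flagged.append((name, label, majority))  # (file, given label, suggested label)
    return flagged


if __name__ == "__main__":
    print(suspicious_labels(SAMPLES))  # [('mislabeled.jpg', 'dog', 'cat')]
```

Requiring a strict majority (`count > k // 2`) keeps borderline images, whose neighbours are split between classes, from being flagged.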
Detailed instructions
- Detailed installation instructions, installing from the stable release, and installation issues
- Detailed running instructions
User community contributions
Stroke AIS Data
Tire Data
Butterfly Mimics
Drugs and Vitamins
Plastic Bottles
Micro Organisms
PCB Boards
ZebraFish
What's the difference?